Glass is a fully recyclable and sustainable material used in the fabrication of everything from tableware and windows for the automotive and construction industries to more technical glass, such as flat screens for smartphones. Demand for glass is rising with population and infrastructure growth. Glass is made in furnaces at very high temperatures, a process that requires constant improvement to meet the energy-efficiency and environmental challenges posed by the manufacturing cycle.
Enhancing efficiency with oxy-fuel technology
The glass industry operates under severe cost constraints and increasing environmental pressure to reduce emissions. Air Liquide provides glass manufacturers with solutions to improve their competitiveness and environmental footprint.
As long-standing experts in oxy-combustion, the process of burning a fuel with pure oxygen instead of air as the primary oxidant, we have substantial R&D resources and extensive experience in glassmaking.
Before glass can even be formed, raw materials (mainly sand) and recycled glass must be melted at extremely high temperatures (around 1,400°C). To achieve the heat intensity required in glass furnaces, Air Liquide steps in with oxy-combustion technologies that replace air with oxygen, improving the melting process, reducing air-pollutant emissions and saving fuel. Our Nexelia™ all-in-one solutions allow rapid alternation of melting temperatures, operational flexibility and automatic temperature control across a wide range of glass furnace sizes.
Before a finished product can be packaged and put on the market, defects must be removed. Air Liquide supplies oxy-natural gas and oxy-hydrogen combustion technologies to polish the product surface and remove the defects present in glass. This gives products, including fine items such as tableware, perfume bottles and crystal glasses, a pleasingly smooth and shiny appearance.
Air Liquide provides glass insulation solutions that involve injecting rare gases such as argon, xenon and krypton into multiple-glazed windows. Not only does this improve building acoustics, it also substantially increases energy efficiency, helping protect the environment.
import pandas as pd
import numpy as np
import sys
from time import time
from collections import Counter
import warnings
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import pandas_profiling as pp
# Data Transformation (Box-Cox)
from scipy.stats import boxcox
from sklearn.base import BaseEstimator, TransformerMixin
# Preprocessing
from sklearn.preprocessing import FunctionTransformer, StandardScaler
# Dimensionality Reduction
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
# Train/Test Split and Model Selection
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold, cross_val_score, GridSearchCV, learning_curve, validation_curve
# Streaming Pipelines
from sklearn.pipeline import Pipeline
# Models
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier, plot_importance
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
%matplotlib inline
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
data = pd.read_csv("C:\\Users\\SHASHI\\OneDrive\\Desktop\\Python\\glass.csv")
data.head(5)
|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.10 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 1.51761 | 13.89 | 3.60 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |
data.shape
(214, 10)
data.dtypes
RI      float64
Na      float64
Mg      float64
Al      float64
Si      float64
K       float64
Ca      float64
Ba      float64
Fe      float64
Type      int64
dtype: object
data.describe()
|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 214.000000 |
| mean | 1.518365 | 13.407850 | 2.684533 | 1.444907 | 72.650935 | 0.497056 | 8.956963 | 0.175047 | 0.057009 | 2.780374 |
| std | 0.003037 | 0.816604 | 1.442408 | 0.499270 | 0.774546 | 0.652192 | 1.423153 | 0.497219 | 0.097439 | 2.103739 |
| min | 1.511150 | 10.730000 | 0.000000 | 0.290000 | 69.810000 | 0.000000 | 5.430000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 1.516523 | 12.907500 | 2.115000 | 1.190000 | 72.280000 | 0.122500 | 8.240000 | 0.000000 | 0.000000 | 1.000000 |
| 50% | 1.517680 | 13.300000 | 3.480000 | 1.360000 | 72.790000 | 0.555000 | 8.600000 | 0.000000 | 0.000000 | 2.000000 |
| 75% | 1.519157 | 13.825000 | 3.600000 | 1.630000 | 73.087500 | 0.610000 | 9.172500 | 0.000000 | 0.100000 | 3.000000 |
| max | 1.533930 | 17.380000 | 4.490000 | 3.500000 | 75.410000 | 6.210000 | 16.190000 | 3.150000 | 0.510000 | 7.000000 |
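The imports announce a "Boxcox transformation class" (`BaseEstimator`, `TransformerMixin`, `boxcox`), but no such class is ever defined in the notebook. A minimal sketch of what it could look like is below; the class name `BoxCoxTransformer` and the `shift` parameter are illustrative assumptions. Note that Box-Cox requires strictly positive inputs, while the summary statistics above show zero minima for Mg, K, Ba and Fe, hence the positivity shift.

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.base import BaseEstimator, TransformerMixin

class BoxCoxTransformer(BaseEstimator, TransformerMixin):
    """Column-wise Box-Cox transform with a positivity shift.

    Box-Cox needs strictly positive data; several glass columns
    (Mg, K, Ba, Fe) contain zeros, hence the shift.
    """
    def __init__(self, shift=1e-6):
        self.shift = shift

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # One lambda per column, estimated on the shifted training data.
        self.lambdas_ = [boxcox(col + self.shift)[1] for col in X.T]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Apply each column's fitted lambda.
        return np.column_stack([boxcox(col + self.shift, lmbda=lam)
                                for col, lam in zip(X.T, self.lambdas_)])

# Quick demo on synthetic positive data (a stand-in for the glass features).
rng = np.random.default_rng(0)
X_demo = rng.gamma(shape=2.0, scale=1.0, size=(50, 3))
bct = BoxCoxTransformer().fit(X_demo)
Xt = bct.transform(X_demo)
```

Because it implements `fit`/`transform`, this class can slot directly into the `Pipeline` imported above.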
data.isnull().sum()
RI      0
Na      0
Mg      0
Al      0
Si      0
K       0
Ca      0
Ba      0
Fe      0
Type    0
dtype: int64
data['Type'].value_counts()
2    76
1    70
7    29
3    17
5    13
6     9
Name: Type, dtype: int64
data['Type'].value_counts().sort_index(ascending=True)
1    70
2    76
3    17
5    13
6     9
7    29
Name: Type, dtype: int64
plt.figure(figsize=(16,8))
sns.countplot(x='Type', data=data, order=data['Type'].value_counts().index);
plt.figure(figsize=(20,10))
sns.boxplot(data=data, orient="h")
<AxesSubplot:>
plt.figure(figsize=(20,10))
sns.heatmap(data.corr(method='pearson'), cbar=False, annot=True, fmt='.1f', linewidths=0.20, cmap='Set1')
<AxesSubplot:>
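`PCA` is imported at the top but never used, even though the heatmap shows strongly correlated columns (RI and Ca in particular). A hedged sketch of how it could be applied after scaling; the data here is a synthetic stand-in with deliberately correlated columns (`X_demo` is an illustrative name, not the loaded glass data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in shaped like the glass features (214 x 9); the last
# five columns are linear mixes of the first four, mimicking the
# correlations visible in the heatmap.
rng = np.random.default_rng(0)
base = rng.normal(size=(214, 4))
X_demo = np.hstack([base, base @ rng.normal(size=(4, 5))])

X_scaled = StandardScaler().fit_transform(X_demo)
pca = PCA(n_components=0.95)  # fewest components explaining 95% of variance
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape, round(pca.explained_variance_ratio_.sum(), 3))
```

On this rank-deficient stand-in, far fewer than nine components are kept; on the real glass data the reduction would be milder.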
pp.ProfileReport(data)
data['Type'].value_counts()
data['Type'].value_counts()*100/len(data)
sns.countplot(x='Type', data=data, palette='dark')
<AxesSubplot:xlabel='Type', ylabel='count'>
X = data.drop('Type',axis=1)
y = data['Type']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=27)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(171, 9)
(43, 9)
(171,)
(43,)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
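`Pipeline` is imported above but never used. Chaining the scaler and a classifier into one estimator means the scaler is refit inside each cross-validation fold, avoiding leakage of test-fold statistics. A hedged sketch on synthetic stand-in data (`X_demo`/`y_demo` are illustrative names):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic stand-in shaped like the glass data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(214, 9))
y_demo = rng.integers(1, 4, size=214)

pipe = Pipeline([
    ("scale", StandardScaler()),          # fitted inside each CV fold
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv)
print(round(scores.mean(), 3))
```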
def models(X_train, y_train):
    # Logistic Regression
    from sklearn.linear_model import LogisticRegression
    log = LogisticRegression(random_state=0)
    log.fit(X_train, y_train)
    # K-Nearest Neighbors
    from sklearn.neighbors import KNeighborsClassifier
    knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    # Support Vector Machine (linear kernel)
    from sklearn.svm import SVC
    svc_lin = SVC(kernel='linear', random_state=0)
    svc_lin.fit(X_train, y_train)
    # Support Vector Machine (RBF kernel)
    svc_rbf = SVC(kernel='rbf', random_state=0)
    svc_rbf.fit(X_train, y_train)
    # Gaussian Naive Bayes
    from sklearn.naive_bayes import GaussianNB
    gauss = GaussianNB()
    gauss.fit(X_train, y_train)
    # Random Forest
    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
    forest.fit(X_train, y_train)
    # Decision Tree
    from sklearn.tree import DecisionTreeClassifier
    tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
    tree.fit(X_train, y_train)
    # XGBoost
    from xgboost import XGBClassifier
    xgboost = XGBClassifier(max_depth=5, learning_rate=0.01, n_estimators=100, gamma=0,
                            min_child_weight=1, subsample=0.8, colsample_bytree=0.8, reg_alpha=0.005)
    xgboost.fit(X_train, y_train)
    # Stochastic Gradient Descent classifier
    from sklearn.linear_model import SGDClassifier
    SGD = SGDClassifier()
    SGD.fit(X_train, y_train)
    # AdaBoost
    from sklearn.ensemble import AdaBoostClassifier
    Ada = AdaBoostClassifier(n_estimators=2000, random_state=0)
    Ada.fit(X_train, y_train)
    # Gradient Boosting
    from sklearn.ensemble import GradientBoostingClassifier
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
    clf.fit(X_train, y_train)
    # Quadratic Discriminant Analysis
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    QDA = QuadraticDiscriminantAnalysis()
    QDA.fit(X_train, y_train)
    # Print each model's accuracy on the training data.
    print('[0]Logistic Regression Training Accuracy:', log.score(X_train, y_train)*100)
    print('[1]K Nearest Neighbor Training Accuracy:', knn.score(X_train, y_train)*100)
    print('[2]Support Vector Machine (Linear Classifier) Training Accuracy:', svc_lin.score(X_train, y_train)*100)
    print('[3]Support Vector Machine (RBF Classifier) Training Accuracy:', svc_rbf.score(X_train, y_train)*100)
    print('[4]Gaussian Naive Bayes Training Accuracy:', gauss.score(X_train, y_train)*100)
    print('[5]Decision Tree Classifier Training Accuracy:', tree.score(X_train, y_train)*100)
    print('[6]Random Forest Classifier Training Accuracy:', forest.score(X_train, y_train)*100)
    print('[7]Xgboost Classifier Training Accuracy:', xgboost.score(X_train, y_train)*100)
    print('[8]SGD Classifier Training Accuracy:', SGD.score(X_train, y_train)*100)
    print('[9]AdaBoost Classifier Training Accuracy:', Ada.score(X_train, y_train)*100)
    print('[10]GradientBoosting Classifier Training Accuracy:', clf.score(X_train, y_train)*100)
    print('[11]Quadratic Discriminant Analysis Training Accuracy:', QDA.score(X_train, y_train)*100)
    return log, knn, svc_lin, svc_rbf, gauss, tree, forest, xgboost, SGD, Ada, clf, QDA
model = models(X_train,y_train)
[20:42:02] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
[0]Logistic Regression Training Accuracy: 59.06432748538012
[1]K Nearest Neighbor Training Accuracy: 72.51461988304094
[2]Support Vector Machine (Linear Classifier) Training Accuracy: 60.23391812865497
[3]Support Vector Machine (RBF Classifier) Training Accuracy: 67.2514619883041
[4]Gaussian Naive Bayes Training Accuracy: 50.877192982456144
[5]Decision Tree Classifier Training Accuracy: 100.0
[6]Random Forest Classifier Training Accuracy: 100.0
[7]Xgboost Classifier Training Accuracy: 81.28654970760235
[8]SGD Classifier Training Accuracy: 46.783625730994146
[9]AdaBoost Classifier Training Accuracy: 57.89473684210527
[10]GradientBoosting Classifier Training Accuracy: 74.85380116959064
[11]Quadratic Discriminant Analysis Training Accuracy: 57.89473684210527
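The 100% training accuracies for the decision tree and random forest signal overfitting rather than performance; training accuracy is not an estimate of generalization. `cross_val_score` and `StratifiedKFold` are imported at the top but never used. A hedged sketch comparing training accuracy against a cross-validated score, on synthetic stand-in data (`X_demo`/`y_demo` are illustrative names standing in for the scaled `X_train`/`y_train`):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the scaled training split.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(171, 9))
y_demo = rng.integers(1, 4, size=171)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
report = {}
for name, est in [("tree", DecisionTreeClassifier(random_state=0)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    train_acc = est.fit(X_demo, y_demo).score(X_demo, y_demo)
    cv_acc = cross_val_score(est, X_demo, y_demo, cv=cv).mean()
    report[name] = (train_acc, cv_acc)
    print(f"{name}: train={train_acc:.2f}  cv={cv_acc:.2f}")
```

Both models memorize the training set (train accuracy 1.0) while the cross-validated score is far lower, which is the gap to watch for in the table above.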
from sklearn.metrics import confusion_matrix
for i in range(len(model)):
    cm = confusion_matrix(y_test, model[i].predict(X_test))
    # Note: these four cells cover only the 2x2 block for the first two
    # classes, so for this 6-class problem the printed value is not the
    # true accuracy (compare with accuracy_score below).
    TN = cm[0][0]
    TP = cm[1][1]
    FN = cm[1][0]
    FP = cm[0][1]
    print(cm)
    print('Model[{}] Testing Accuracy = "{}!"'.format(i, (TP + TN) / (TP + TN + FN + FP)))
    print()  # print a new line
# Other ways to get the classification accuracy and related metrics
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
for i in range(len(model)):
    print('Model ', i)
    # Precision, recall, f1-score per class
    print(classification_report(y_test, model[i].predict(X_test)))
    # Another way to get the model's accuracy on the test data
    print(accuracy_score(y_test, model[i].predict(X_test)))
    print()  # print a new line
[[ 8 1 0 0 0 0]
[13 5 0 0 0 1]
[ 2 3 0 0 0 0]
[ 0 2 0 0 0 0]
[ 0 2 0 0 0 0]
[ 0 0 0 0 0 6]]
Model[0] Testing Accuracy = "0.48148148148148145!"
[[ 7 2 0 0 0 0]
[ 8 11 0 0 0 0]
[ 3 2 0 0 0 0]
[ 0 1 0 0 1 0]
[ 0 1 0 0 1 0]
[ 0 0 0 0 0 6]]
Model[1] Testing Accuracy = "0.6428571428571429!"
[[ 8 1 0 0 0 0]
[12 6 0 1 0 0]
[ 2 3 0 0 0 0]
[ 0 1 0 1 0 0]
[ 0 1 0 0 0 1]
[ 0 0 0 1 0 5]]
Model[2] Testing Accuracy = "0.5185185185185185!"
[[ 9 0 0 0 0 0]
[ 9 10 0 0 0 0]
[ 4 1 0 0 0 0]
[ 0 1 0 0 1 0]
[ 0 2 0 0 0 0]
[ 0 0 0 0 1 5]]
Model[3] Testing Accuracy = "0.6785714285714286!"
[[ 8 1 0 0 0 0]
[17 1 0 1 0 0]
[ 4 1 0 0 0 0]
[ 0 2 0 0 0 0]
[ 0 2 0 0 0 0]
[ 0 0 0 0 0 6]]
Model[4] Testing Accuracy = "0.3333333333333333!"
[[ 9 0 0 0 0 0]
[ 7 11 1 0 0 0]
[ 2 2 1 0 0 0]
[ 0 0 0 1 1 0]
[ 0 0 0 1 0 1]
[ 0 0 0 0 1 5]]
Model[5] Testing Accuracy = "0.7407407407407407!"
[[ 9 0 0 0 0 0]
[ 6 13 0 0 0 0]
[ 3 2 0 0 0 0]
[ 0 1 0 0 1 0]
[ 0 0 0 1 0 1]
[ 0 0 0 0 1 5]]
Model[6] Testing Accuracy = "0.7857142857142857!"
[[ 8 1 0 0 0 0]
[ 3 16 0 0 0 0]
[ 4 1 0 0 0 0]
[ 0 2 0 0 0 0]
[ 1 0 0 0 0 1]
[ 0 1 0 0 0 5]]
Model[7] Testing Accuracy = "0.8571428571428571!"
[[ 0 9 0 0 0 0]
[ 1 17 0 0 0 1]
[ 1 4 0 0 0 0]
[ 0 1 0 0 0 1]
[ 0 1 0 0 0 1]
[ 0 0 0 0 0 6]]
Model[8] Testing Accuracy = "0.6296296296296297!"
[[ 9 0 0 0 0 0]
[17 2 0 0 0 0]
[ 5 0 0 0 0 0]
[ 0 0 0 2 0 0]
[ 0 0 0 1 0 1]
[ 0 0 0 1 0 5]]
Model[9] Testing Accuracy = "0.39285714285714285!"
[[ 7 0 2 0 0 0]
[ 7 10 0 2 0 0]
[ 3 0 2 0 0 0]
[ 0 0 0 2 0 0]
[ 0 0 0 1 0 1]
[ 0 0 0 2 0 4]]
Model[10] Testing Accuracy = "0.7083333333333334!"
[[ 9 0 0 0 0 0]
[17 1 0 1 0 0]
[ 5 0 0 0 0 0]
[ 0 0 0 1 1 0]
[ 0 1 0 0 0 1]
[ 0 0 0 0 0 6]]
Model[11] Testing Accuracy = "0.37037037037037035!"
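The "Testing Accuracy" values above come from `(TP + TN) / (TP + TN + FN + FP)`, which reads only the 2×2 block for the first two classes of a 6-class confusion matrix; that is why they disagree with the `accuracy_score` values in the classification reports. For a multiclass matrix, accuracy is the trace divided by the total count. A check against the first matrix above (logistic regression):

```python
import numpy as np

# First confusion matrix printed above (logistic regression, 6 classes).
cm = np.array([[ 8,  1, 0, 0, 0, 0],
               [13,  5, 0, 0, 0, 1],
               [ 2,  3, 0, 0, 0, 0],
               [ 0,  2, 0, 0, 0, 0],
               [ 0,  2, 0, 0, 0, 0],
               [ 0,  0, 0, 0, 0, 6]])

# The notebook's 2x2 formula, restricted to classes 1 and 2.
two_class = (cm[0, 0] + cm[1, 1]) / (cm[0, 0] + cm[1, 1] + cm[1, 0] + cm[0, 1])
# Correct multiclass accuracy: correct predictions over all predictions.
full = np.trace(cm) / cm.sum()
print(two_class, full)  # 0.4814... vs 0.4418...
```

The second number matches the 0.4418604651162791 reported for Model 0 by `accuracy_score` below, confirming which figure to trust.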
Model 0
precision recall f1-score support
1 0.35 0.89 0.50 9
2 0.38 0.26 0.31 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 0.86 1.00 0.92 6
accuracy 0.44 43
macro avg 0.26 0.36 0.29 43
weighted avg 0.36 0.44 0.37 43
0.4418604651162791
Model 1
precision recall f1-score support
1 0.39 0.78 0.52 9
2 0.65 0.58 0.61 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.50 0.50 0.50 2
7 1.00 1.00 1.00 6
accuracy 0.58 43
macro avg 0.42 0.48 0.44 43
weighted avg 0.53 0.58 0.54 43
0.5813953488372093
Model 2
precision recall f1-score support
1 0.36 0.89 0.52 9
2 0.50 0.32 0.39 19
3 0.00 0.00 0.00 5
5 0.33 0.50 0.40 2
6 0.00 0.00 0.00 2
7 0.83 0.83 0.83 6
accuracy 0.47 43
macro avg 0.34 0.42 0.36 43
weighted avg 0.43 0.47 0.41 43
0.46511627906976744
Model 3
precision recall f1-score support
1 0.41 1.00 0.58 9
2 0.71 0.53 0.61 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 1.00 0.83 0.91 6
accuracy 0.56 43
macro avg 0.35 0.39 0.35 43
weighted avg 0.54 0.56 0.52 43
0.5581395348837209
Model 4
precision recall f1-score support
1 0.28 0.89 0.42 9
2 0.14 0.05 0.08 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 1.00 1.00 1.00 6
accuracy 0.35 43
macro avg 0.24 0.32 0.25 43
weighted avg 0.26 0.35 0.26 43
0.3488372093023256
Model 5
precision recall f1-score support
1 0.50 1.00 0.67 9
2 0.85 0.58 0.69 19
3 0.50 0.20 0.29 5
5 0.50 0.50 0.50 2
6 0.00 0.00 0.00 2
7 0.83 0.83 0.83 6
accuracy 0.63 43
macro avg 0.53 0.52 0.50 43
weighted avg 0.68 0.63 0.62 43
0.627906976744186
Model 6
precision recall f1-score support
1 0.50 1.00 0.67 9
2 0.81 0.68 0.74 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 0.83 0.83 0.83 6
accuracy 0.63 43
macro avg 0.36 0.42 0.37 43
weighted avg 0.58 0.63 0.58 43
0.627906976744186
Model 7
precision recall f1-score support
1 0.50 0.89 0.64 9
2 0.76 0.84 0.80 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 0.83 0.83 0.83 6
accuracy 0.67 43
macro avg 0.35 0.43 0.38 43
weighted avg 0.56 0.67 0.60 43
0.6744186046511628
Model 8
precision recall f1-score support
1 0.00 0.00 0.00 9
2 0.53 0.89 0.67 19
3 0.00 0.00 0.00 5
5 0.00 0.00 0.00 2
6 0.00 0.00 0.00 2
7 0.67 1.00 0.80 6
accuracy 0.53 43
macro avg 0.20 0.32 0.24 43
weighted avg 0.33 0.53 0.41 43
0.5348837209302325
Model 9
precision recall f1-score support
1 0.29 1.00 0.45 9
2 1.00 0.11 0.19 19
3 0.00 0.00 0.00 5
5 0.50 1.00 0.67 2
6 0.00 0.00 0.00 2
7 0.83 0.83 0.83 6
accuracy 0.42 43
macro avg 0.44 0.49 0.36 43
weighted avg 0.64 0.42 0.33 43
0.4186046511627907
Model 10
precision recall f1-score support
1 0.41 0.78 0.54 9
2 1.00 0.53 0.69 19
3 0.50 0.40 0.44 5
5 0.29 1.00 0.44 2
6 0.00 0.00 0.00 2
7 0.80 0.67 0.73 6
accuracy 0.58 43
macro avg 0.50 0.56 0.47 43
weighted avg 0.71 0.58 0.59 43
0.5813953488372093
Model 11
precision recall f1-score support
1 0.29 1.00 0.45 9
2 0.50 0.05 0.10 19
3 0.00 0.00 0.00 5
5 0.50 0.50 0.50 2
6 0.00 0.00 0.00 2
7 0.86 1.00 0.92 6
accuracy 0.40 43
macro avg 0.36 0.43 0.33 43
weighted avg 0.42 0.40 0.29 43
0.3953488372093023
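All twelve models above were evaluated with their initial hyperparameters; `GridSearchCV` is imported but never used. A hedged sketch of a small search, shown for the random forest on synthetic stand-in data (the grid values and `X_demo`/`y_demo` names are illustrative, not tuned for the glass data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the scaled training split.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(171, 9))
y_demo = rng.integers(1, 4, size=171)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid,
                      cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
                      scoring="accuracy")
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

`search.best_estimator_` is refit on the full training data and can then be scored on the held-out test split.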
# Test accuracies hard-coded from a separate run; they do not all match
# the accuracy_score values printed above.
acc_1 = 0.703703*100
acc_2 = 0.685185*100
acc_3 = 0.666666*100
acc_4 = 0.759259*100
acc_5 = 0.537037*100
acc_6 = 0.648148*100
acc_7 = 0.796296*100
acc_8 = 0.814814*100
acc_9 = 0.592592*100
acc_10 = 0.61111*100
acc_11 = 0.81481*100
acc_12 = 0.64814*100
results = pd.DataFrame([["Logistic Regression",acc_1],["Nearest Neighbor",acc_2],["Support Vector Machine (Linear Classifier)",acc_3],
["Support Vector Machine (RBF Classifier)",acc_4],["Gaussian Naive Bayes",acc_5],["Decision Tree Classifier",acc_6],
["Random Forest Classifier",acc_7],["Xgboost Classifier",acc_8],["SGD Classifier ",acc_9],["AdaBoost Classifier",acc_10],
["GradientBoosting Classifier",acc_11],["Quadratic Discriminant Analysis",acc_12],
],columns = ["Models","Accuracy Score"]).sort_values(by='Accuracy Score',ascending=False)
results.style.background_gradient(cmap='Blues')
|   | Models | Accuracy Score |
|---|---|---|
| 7 | Xgboost Classifier | 81.481400 |
| 10 | GradientBoosting Classifier | 81.481000 |
| 6 | Random Forest Classifier | 79.629600 |
| 3 | Support Vector Machine (RBF Classifier) | 75.925900 |
| 0 | Logistic Regression | 70.370300 |
| 1 | Nearest Neighbor | 68.518500 |
| 2 | Support Vector Machine (Linear Classifier) | 66.666600 |
| 5 | Decision Tree Classifier | 64.814800 |
| 11 | Quadratic Discriminant Analysis | 64.814000 |
| 9 | AdaBoost Classifier | 61.111000 |
| 8 | SGD Classifier | 59.259200 |
| 4 | Gaussian Naive Bayes | 53.703700 |
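The accuracy column above is typed in by hand via `acc_1`..`acc_12`. A sketch of building the same table programmatically from fitted models, so it always reflects the actual `accuracy_score` values; shown with two of the twelve models on synthetic stand-in data (`X_demo`, `fitted` etc. are illustrative names):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in shaped like the glass data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(214, 9))
y_demo = rng.integers(1, 4, size=214)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

fitted = {
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr),
}
results = (pd.DataFrame(
               [(name, accuracy_score(y_te, est.predict(X_te)) * 100)
                for name, est in fitted.items()],
               columns=["Models", "Accuracy Score"])
           .sort_values("Accuracy Score", ascending=False))
print(results)
```

With the notebook's `model` tuple, the dict would map each model name to the corresponding fitted estimator, removing the manual transcription step.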